NSF PAR Search | NSF Public Access Repository


Board 408: Toward Building a Human-Computer Coding Partnership: Using Machine Learning to Analyze Short-Answer Explanations to Conceptually Challenging Questions

https://doi.org/10.18260/1-2--46996

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2024, ASEE Conferences)

Full Text Available
Using Machine Learning to Analyze Short-Answer Responses to Conceptually Challenging Chemical Engineering Thermodynamics Questions

https://doi.org/10.18260/1-2--48236

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2024, ASEE Conferences)

Full Text Available
Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning

https://doi.org/10.18653/v1/2023.emnlp-main.130

Shivagunde, Namrata; Lialin, Vladislav; Rumshisky, Anna (December 2023, Association for Computational Linguistics)

Language model probing is often used to test specific capabilities of models. However, conclusions from such studies may be limited when the probing benchmarks are small and lack statistical power. In this work, we introduce new, larger datasets for negation (NEG-1500-SIMP) and role reversal (ROLE-1500) inspired by psycholinguistic studies. We dramatically extend the existing NEG-136 and ROLE-88 benchmarks using GPT3, increasing their size from 18 and 44 sentence pairs to 750 each. We also create another version of the extended negation dataset (NEG-1500-SIMP-TEMP) using template-based generation; it consists of 770 sentence pairs. We evaluate 22 models on the extended datasets and see model performance dip 20-57% compared to the original smaller benchmarks. We observe high levels of negation sensitivity in models like BERT and ALBERT, demonstrating that previous findings might have been skewed due to smaller test sets. Finally, we observe that while GPT3 generated all the examples in ROLE-1500, it is only able to solve 24.6% of them during probing. The datasets and code are available on GitHub.
Full Text Available
WIP: Using Machine Learning to Automate Coding of Student Explanations to Challenging Mechanics Concept Questions

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2022, ASEE 2022 Annual Conference)

This work-in-progress paper presents a joint effort by engineering education and machine learning researchers to develop automated methods for analyzing student responses to challenging conceptual questions in mechanics. These open-ended questions, which emphasize understanding of physical principles rather than calculations, are widely used in large STEM classes to support active learning strategies that have been shown to improve student outcomes. Despite their benefits, written justifications are not commonly used, largely because evaluating them is time-consuming for both instructors and researchers. This study explores the potential of large pre-trained generative sequence-to-sequence language models to streamline the analysis and coding of these student responses.
Full Text Available
Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

https://doi.org/10.18653/v1/2022.acl-long.189

Kulshreshtha, Saurabh; Kovaleva, Olga; Shivagunde, Namrata; Rumshisky, Anna (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release a corpus of crossword puzzles collected from the New York Times daily crossword, spanning 25 years and comprising around nine thousand puzzles. These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual, as well as clues that depend on the answers to other clues. We separately release the clue-answer pairs from these puzzles as an open-domain question answering dataset containing over half a million unique clue-answer pairs. For the question answering task, our baselines include several sequence-to-sequence and retrieval-based generative models. We also introduce a non-parametric constraint satisfaction baseline for solving the entire crossword puzzle. Finally, we propose an evaluation framework that consists of several complementary performance metrics.
Full Text Available
Life after BERT: What do Other Muppets Understand about Language?

https://doi.org/10.18653/v1/2022.acl-long.227

Lialin, Vladislav; Zhao, Kevin; Shivagunde, Namrata; Rumshisky, Anna (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
BERT Busters: Outlier Dimensions that Disrupt Transformers

https://doi.org/10.18653/v1/2021.findings-acl.300

Kovaleva, Olga; Kulshreshtha, Saurabh; Rogers, Anna; Rumshisky, Anna (August 2021, Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021)

Full Text Available
A Primer in BERTology: What We Know About How BERT Works

https://doi.org/10.1162/tacl_a_00349

Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (December 2020, Transactions of the Association for Computational Linguistics)
Das, Dipanjan (Ed.)
Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Full Text Available
When BERT Plays the Lottery, All Tickets Are Winning

https://doi.org/10.18653/v1/2020.emnlp-main.259

Prasanna, Sai; Rogers, Anna; Rumshisky, Anna (November 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))
Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the “good” subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.
Full Text Available
Revealing the Dark Secrets of BERT

https://doi.org/10.18653/v1/D19-1445

Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (January 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Full Text Available
